AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.59)

Neural Information Processing Systems, Aug-14-2025, 04:58:59 GMT

33610fba262d7b6fed0810b89f55e147-Supplemental-Conference.pdf

graph convolutional network, non-parametric representation, representation, (13 more...)

Country:

North America > United States > Minnesota (0.05)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.05)

Industry:

Leisure & Entertainment > Sports > Skiing (0.47)
Health & Medicine (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Aug-14-2025, 04:58:55 GMT

Multiview Human Body Reconstruction from Uncalibrated Cameras

Specifically, we map per-pixel image features to a canonical body surface coordinate system agnostic to views and poses using dense keypoints (correspondences). This feature mapping allows us to semantically, instead of geometrically, align and fuse visual features from multiview images.
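The fusion idea in this abstract can be illustrated with a minimal sketch. It assumes each view already provides per-pixel features and per-pixel canonical surface coordinates in [0, 1); the function name, the coarse grid and the simple averaging are illustrative assumptions, not the paper's method.

```python
import numpy as np

def fuse_in_canonical_space(features, uv, valid, grid=4):
    """Average per-pixel features from several views in a shared surface grid.

    features: (V, H, W, C) per-pixel features for V views
    uv:       (V, H, W, 2) canonical surface coordinates in [0, 1)
    valid:    (V, H, W) bool, True where a pixel lies on the body
    """
    channels = features.shape[-1]
    fused = np.zeros((grid, grid, channels))
    count = np.zeros((grid, grid))
    # Quantize canonical coordinates to grid cells (hypothetical resolution).
    cells = np.clip((uv * grid).astype(int), 0, grid - 1)
    u = cells[..., 0][valid]
    v = cells[..., 1][valid]
    f = features[valid]
    # Accumulate per cell; np.add.at handles repeated indices correctly.
    np.add.at(fused, (u, v), f)
    np.add.at(count, (u, v), 1)
    seen = count > 0
    fused[seen] /= count[seen][:, None]
    return fused, seen
```

Because pixels are grouped by surface coordinate rather than by 3D position, no camera calibration enters the fusion step, which is the sense in which the alignment is semantic rather than geometric.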

estimation, multiview image, visual feature, (15 more...)

Country:

North America > United States > Minnesota (0.05)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Industry: Health & Medicine (0.42)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Lawson, Jared, Chitale, Rohan, Simaan, Nabil

Fluoroscopic Shape and Pose Tracking of Catheters with Custom Radiopaque Markers

arXiv.org Artificial Intelligence, Jun-18-2025

Safe navigation of steerable and robotic catheters in the cerebral vasculature requires awareness of the catheter's shape and pose. Currently, a significant perception burden is placed on interventionalists to mentally reconstruct and predict catheter motions from biplane fluoroscopy images. Efforts to track these catheters are limited to planar segmentation or bulky sensing instrumentation, which are incompatible with microcatheters used in neurointervention. In this work, a catheter is equipped with custom radiopaque markers arranged to enable simultaneous shape and pose estimation under biplane fluoroscopy. A design measure is proposed to guide the arrangement of these markers to minimize sensitivity to marker tracking uncertainty. Endovascular neurosurgery is a rapidly growing domain which enables treatment of cerebrovascular disease with minimally invasive approaches. The most common endovascular neurointerventions include aneurysm coiling and mechanical thrombectomy (MT), which has become the gold standard for treating strokes caused by large vessel occlusions (LVOs).
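As background on how a marker seen in two fluoroscopy planes can be located in 3D, here is a sketch of standard linear (DLT) triangulation. The abstract does not describe the paper's estimation procedure, so the function name, the 3x4 projection matrices and the example geometry below are assumptions for illustration only.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two projective views.

    P1, P2: (3, 4) projection matrices of the two imaging planes
    x1, x2: (2,) image coordinates of the same marker in each view
    """
    # Each view contributes two linear constraints on the homogeneous point.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Hypothetical example: two views offset by one unit along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
point = triangulate(P1, P2, np.array([0.25, 0.1]), np.array([-0.25, 0.1]))
```

With noisy marker detections the recovered point shifts, which is why a marker arrangement that is less sensitive to tracking uncertainty matters.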

artificial intelligence, catheter, fluoroscopy, (17 more...)

doi: 10.1109/LRA.2025.3581043

2506.09934

Country:

Europe (0.68)
North America > United States (0.46)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Equipment & Supplies (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.94)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Neural Information Processing Systems, Oct-7-2024, 10:10:59 GMT

Reviews: Unsupervised Learning of Shape and Pose with Differentiable Point Clouds

I maintain my original review and think the paper should be accepted. To get around the ambiguity of shape and pose, the authors propose to have an ensemble of pose predictors, which they distill post-training into a single model. I am inclined to accept the paper. The method is a solid solution to an interesting problem and the paper is well-written. In more detail: a) This is clearly a novel solution to an interesting but, so far, poorly explored problem.
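The review does not say how the ensemble is distilled. One plausible reading, assumed here purely for illustration, is that each training sample takes the prediction of whichever ensemble member has the lowest loss on it, and a single student model is then regressed onto those targets. The function name and array layout are hypothetical.

```python
import numpy as np

def best_of_ensemble_targets(preds, losses):
    """Pick, per sample, the prediction of the lowest-loss ensemble member.

    preds:  (N, K, D) pose predictions from K ensemble members
    losses: (N, K) per-sample loss of each member
    Returns (N, D) targets for training a single student predictor.
    """
    best = np.argmin(losses, axis=1)
    return preds[np.arange(len(preds)), best]

# Hypothetical toy data: 2 samples, 2 ensemble members, 1-D pose.
preds = np.array([[[0.0], [1.0]], [[2.0], [3.0]]])
losses = np.array([[0.5, 0.1], [0.2, 0.9]])
targets = best_of_ensemble_targets(preds, losses)
```

Selecting the best member per sample, rather than averaging, avoids blending predictions that correspond to different and equally plausible poses.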

differentiable point cloud, shape and pose, unsupervised learning, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.43)

Liao, Ziwei, Xu, Binbin, Waslander, Steven L.

Toward General Object-level Mapping from Sparse Views with 3D Diffusion Priors

arXiv.org Artificial Intelligence, Oct-7-2024

Object-level mapping [1, 2, 3, 4, 5, 6, 7, 8, 9] builds a 3D map of multiple object instances in a scene, which is critical for scene understanding [10] and has various applications in robotic manipulation [11], semantic navigation [12, 13] and long-term dynamic map maintenance [14]. It addresses two closely coupled tasks: 3D shape reconstruction [15, 16] and pose estimation [17]. Conventional methods [18, 19, 20] approach these tasks from a perspective of state estimation [21], solving an inverse problem where low-dimensional observations (RGB and depth images) are used to recover high-dimensional unknown variables (3D poses and shapes) through a known observation process (e.g., projection and differentiable rendering). However, these methods require dense observations (e.g., hundreds of views for NeRF [18]) to fully constrain the problem. In robotics or AR applications, obtaining such dense observations is challenging due to limitations in the robot's or user's observation angle and occlusions in cluttered scenarios. Therefore, it is crucial to develop methods that can map from sparse (fewer than 10) or even single observations. Human vision can infer complete 3D objects from images despite occlusions by using prior knowledge of the objects, which represents the prior distributions of the shapes of specific categories, such as chairs, based on thousands of instances observed in daily life. We aim to introduce generative models [22] as providers of prior knowledge to constrain the 3D object mapping. Generative models have demonstrated impressive abilities to generate high-quality multi-modal data by learning distributions in large-scale datasets, including texts [23], images [24], videos [25], and 3D models [26, 27, 28, 29].

category, constraint, diffusion model, (14 more...)

2410.05514

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(2 more...)

arXiv.org Artificial Intelligence, Jul-17-2024

Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts

Li, Jianhao, Sun, Tianyu, Wang, Zhongdao, Xie, Enze, Feng, Bailan, Zhang, Hongbo, Yuan, Ze, Xu, Ke, Liu, Jiaheng, Luo, Ping

This paper proposes an algorithm for automatically labeling 3D objects from 2D point or box prompts, especially focusing on applications in autonomous driving. Unlike prior work, our auto-labeler predicts 3D shapes instead of bounding boxes and does not require training on a specific dataset. We propose a Segment, Lift, and Fit (SLF) paradigm to achieve this goal. First, we segment high-quality instance masks from the prompts using the Segment Anything Model (SAM) and transform the remaining problem into predicting 3D shapes from given 2D masks. This problem is ill-posed and therefore challenging, because multiple 3D shapes can project into an identical mask. To tackle this issue, we then lift 2D masks to 3D forms and employ gradient descent to adjust their poses and shapes until the projections fit the masks and the surfaces conform to surrounding LiDAR points. Notably, since we do not train on a specific dataset, the SLF auto-labeler does not overfit to biased annotation patterns in the training set as other methods do. Thus, the generalization ability across different datasets improves. Experimental results on the KITTI dataset demonstrate that the SLF auto-labeler produces high-quality bounding box annotations, achieving an AP@0.5 IoU of nearly 90%. Detectors trained with the generated pseudo-labels perform nearly as well as those trained with actual ground-truth annotations. Furthermore, the SLF auto-labeler shows promising results in detailed shape predictions, providing a potential alternative for the occupancy annotation of dynamic objects.
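To make the "AP@0.5 IoU" figure concrete, the sketch below computes intersection over union and applies the 0.5 matching threshold. It is a simplification: it uses axis-aligned 2D boxes given as (x1, y1, x2, y2), whereas benchmark evaluations typically use oriented 3D or bird's-eye-view boxes. The function names are illustrative.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Overlap extent along each axis, clamped at zero when boxes are disjoint.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def is_match(pred, truth, threshold=0.5):
    """A prediction counts as a true positive when IoU reaches the threshold."""
    return box_iou(pred, truth) >= threshold
```

Average precision is then computed over the ranked predictions, counting a prediction as correct only when it matches a ground-truth box under this test.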

detection, point cloud, slf, (14 more...)

2407.11382

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > New York (0.04)
North America > Canada > Quebec > Capitale-Nationale Region > Québec (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.35)

Ferraro, Stefano, Van de Maele, Toon, Mazzaglia, Pietro, Verbelen, Tim, Dhoedt, Bart

Disentangling Shape and Pose for Object-Centric Deep Active Inference Models

arXiv.org Artificial Intelligence, Sep-16-2022

Active inference is a first-principles approach for understanding the brain in particular, and sentient agents in general, with the single imperative of minimizing free energy. As such, it provides a computational account for modelling artificial intelligent agents, by defining the agent's generative model and inferring the model parameters, actions and hidden state beliefs. However, the exact specification of the generative model and the hidden state space structure is left to the experimenter, whose design choices influence the resulting behaviour of the agent. Recently, deep learning methods have been proposed to learn a hidden state space structure purely from data, relieving the experimenter of this tedious design task, but resulting in an entangled, non-interpretable state space. In this paper, we hypothesize that such a learnt, entangled state space does not necessarily yield the best model in terms of free energy, and that enforcing different factors in the state space can yield a lower model complexity. In particular, we consider the problem of 3D object representation, and focus on different instances of the ShapeNet dataset. We propose a model that factorizes object shape, pose and category, while still learning a representation for each factor using a deep neural network. We show that the models with the best disentanglement properties perform best when adopted by an active agent in reaching preferred observations.

artificial intelligence, category, machine learning, (19 more...)

2209.09097

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > France (0.04)
Europe > Belgium > Flanders > East Flanders > Ghent (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial Intelligence, Sep-6-2022

Reconstructing Action-Conditioned Human-Object Interactions Using Commonsense Knowledge Priors

Wang, Xi, Li, Gen, Kuo, Yen-Ling, Kocabas, Muhammed, Aksan, Emre, Hilliges, Otmar

We present a method for inferring diverse 3D models of human-object interactions from images. Reasoning about how humans interact with objects in complex scenes from a single 2D image is a challenging task given ambiguities arising from the loss of information through projection. In addition, modeling 3D interactions requires the ability to generalize to diverse object categories and interaction types. We propose an action-conditioned modeling of interactions that allows us to infer diverse 3D arrangements of humans and objects without supervision on contact regions or 3D scene geometry. Our method extracts high-level commonsense knowledge from large language models (such as GPT-3) and applies it to perform 3D reasoning of human-object interactions. Our key insight is that priors extracted from large language models can help in reasoning about human-object contacts from textual prompts only. We quantitatively evaluate the inferred 3D models on a large human-object interaction dataset and show how our method leads to better 3D reconstructions. We further qualitatively evaluate the effectiveness of our method on real images and demonstrate its generalizability towards interaction types and object categories.

category, human-object interaction, interaction, (16 more...)